ON ADAPTIVE INFERENCE AND CONFIDENCE BANDS

By Marc Hoffmann and Richard Nickl 

ENSAE-CREST and University of Cambridge 

The problem of existence of adaptive confidence bands for an unknown
density $f$ that belongs to a nested scale of Hölder classes over
$\mathbb{R}$ or $[0,1]$ is considered. Whereas honest adaptive inference in
this problem is impossible already for a pair of Hölder balls $\Sigma(r)$,
$\Sigma(s)$, $r \neq s$, of fixed radius, a nonparametric distinguishability
condition is introduced under which adaptive confidence bands can be shown
to exist. It is further shown that this condition is necessary and
sufficient for the existence of honest asymptotic confidence bands, and
that it is strictly weaker than similar analytic conditions recently
employed in Giné and Nickl [Ann. Statist. 38 (2010) 1122-1170]. The
exceptional sets for which honest inference is not possible have
vanishingly small probability under natural priors on Hölder balls
$\Sigma(s)$. If no upper bound for the radius of the Hölder balls is known,
a price for adaptation has to be paid, and near-optimal adaptation is
possible for standard procedures. The implications of these findings for a
general theory of adaptive inference are discussed.

1. Introduction. One of the intriguing problems in the paradigm of
adaptive nonparametric function estimation as developed in the last two
decades is what one could call the "hiatus" between estimation and
inference, or, to be more precise, between the existence of adaptive risk
bounds and the nonexistence of adaptive confidence statements. In a
nutshell the typical situation in nonparametric statistics could be
described as follows: one is interested in a functional parameter $f$ that
could belong either to $\Sigma$ or to $\Sigma'$, two sets that can be
distinguished by a certain "structural property," such as smoothness, with
the possibility that $\Sigma \subset \Sigma'$. Based on a sample whose
distribution depends on $f$, one aims to find a statistical procedure that
adapts to the unknown structural property, that is, that performs optimally
without having to know whether $f \in \Sigma$ or $f \in \Sigma'$. Now while
such procedures can often be proved to exist, the statistician cannot take
advantage of this optimality for inference: To cite Robins and van der
Vaart [29], "An adaptive estimator can adapt to an underlying model, but
does not reveal which model it adapts to, with the consequence that
nonparametric confidence sets are necessarily much larger than the actual
discrepancy between an adaptive estimator and the true parameter."

We argue in this article that adaptive inference is possible if the
structural property that defines $\Sigma$ and $\Sigma'$ is statistically
identifiable, by which we shall mean here that the nonparametric hypotheses
$H_0 : f \in \Sigma$ and $H_1 : f \in \Sigma' \setminus \Sigma$ are
asymptotically consistently distinguishable (in the sense of Ingster
[16-18]). In common adaptation problems this will necessitate that certain
unidentified parts of the parameter space be removed, in other words, that
the alternative hypothesis $H_1$ be restricted to a subset $\tilde\Sigma$
of $\Sigma' \setminus \Sigma$. One is in turn interested in choosing
$\tilde\Sigma$ as large as possible, which amounts to imposing minimal
identifiability conditions on the parameter space. We shall make these
ideas rigorous in one key example of adaptive inference: confidence bands
for nonparametric density functions $f$ that adapt to the unknown
smoothness of $f$. The general approach, however, is not specific to this
example as we shall argue at the end of this introduction, and the
heuristic mentioned above is valid more generally.

The interest in the example of confidence bands comes partly from the fact
that the discrepancy between estimation and inference in this case is
particularly pronounced. Let us highlight the basic problem in a simple
"toy adaptation" problem. Consider $X_1, \ldots, X_n$ independent and
identically distributed random variables taking values in $[0,1]$ with
common probability density function $f$ and joint law $\Pr_f$. We are
interested in the existence of confidence bands for $f$ that are adaptive
over two nested balls in the classical Hölder spaces
$\mathcal{C}^s([0,1]) \subset \mathcal{C}^r([0,1])$, $s > r$, of smooth
functions with norm given by $\|\cdot\|_{s,\infty}$; see Definition 1
below. Define the class of densities

(1.1) $$\Sigma(s) := \Sigma(s,B) = \Big\{ f : [0,1] \to [0,\infty),\ \int_0^1 f(x)\,dx = 1,\ \|f\|_{s,\infty} \le B \Big\}$$

and note that $\Sigma(s) \subset \Sigma(r)$ for $s > r$. We shall assume
throughout that $B \ge 1$ to ensure that $\Sigma(s)$ is nonempty.

A confidence band $C_n = C_n(X_1, \ldots, X_n)$ is a family of random
intervals

$$\{ C_n(y) = [c_n(y), c_n'(y)] \}_{y \in [0,1]}$$

that contains graphs of densities $f : [0,1] \to [0,\infty)$. We denote by
$|C_n| = \sup_{y \in [0,1]} |c_n'(y) - c_n(y)|$ the maximal diameter of
$C_n$. Following Li [24] the band $C_n$ is called asymptotically honest
with level $\alpha$ for a family of probability densities $\mathcal{P}$ if
it satisfies the asymptotic coverage inequality

(1.2) $$\liminf_n \inf_{f \in \mathcal{P}} \Pr\nolimits_f\big( f(y) \in C_n(y)\ \forall y \in [0,1] \big) \ge 1 - \alpha.$$

We shall usually only write $\Pr_f(f \in C_n)$ for the coverage probability
if no confusion may arise. Note that $\mathcal{P}$ may (and later typically
will have to) depend on the sample size $n$. Suppose the goal is to find a
confidence band that is honest for the class

$$\mathcal{P}^{\mathrm{all}} := \Sigma(s) \cup \Sigma(r) = \Sigma(r)$$

and that is simultaneously adaptive in the sense that the expected diameter
$E_f|C_n|$ of $C_n$ satisfies, for every $n$ (large enough),

(1.3) $$\sup_{f \in \Sigma(s)} E_f|C_n| \le L r_n(s), \qquad \sup_{f \in \Sigma(r)} E_f|C_n| \le L r_n(r),$$

where $L$ is a finite constant independent of $n$ and where

$$r_n(s) = \Big( \frac{\log n}{n} \Big)^{s/(2s+1)}.$$

Indeed even if $s$ were known no band could have expected diameter of
smaller order than $r_n(s)$ uniformly over $\Sigma(s)$ (e.g., Proposition 1
below), so that we are looking for a band that is asymptotically honest for
$\mathcal{P}^{\mathrm{all}}$ and that shrinks at the fastest possible rate
over $\Sigma(s)$ and $\Sigma(r)$ simultaneously. It follows from Theorem 2
in Low [26] (see also [4, 8]) that such bands do not exist.
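To get a feeling for the gap between the two rates in (1.3), the following minimal Python sketch evaluates $r_n(s) = (\log n / n)^{s/(2s+1)}$ for two smoothness levels. The function name `rate` and the particular values of $n$, $r$ and $s$ are illustrative assumptions, not part of the paper.

```python
import math


def rate(n: int, s: float) -> float:
    """Sup-norm rate r_n(s) = (log n / n)^(s / (2s + 1)) from the text."""
    return (math.log(n) / n) ** (s / (2.0 * s + 1.0))


if __name__ == "__main__":
    r, s = 0.5, 2.0  # illustrative smoothness levels with s > r
    for n in (10**3, 10**5, 10**7):
        # The ratio r_n(r) / r_n(s) grows polynomially in n: a band that only
        # achieves the rate r_n(r) is far wider than an adaptive one on Sigma(s).
        print(n, rate(n, r), rate(n, s), rate(n, r) / rate(n, s))
```

The ratio printed in the last column is what an honest band over the full class has to give up on $\Sigma(s)$ according to the impossibility result discussed next.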

Theorem 1 (Low). Any confidence band $C_n$ that is honest over
$\mathcal{P}^{\mathrm{all}}$ with level $\alpha < 1$ necessarily satisfies

$$\limsup_n \sup_{f \in \Sigma(s)} \frac{E_f|C_n|}{r_n(s)} = \infty.$$

The puzzling fact is that this is in stark contrast to the situation in
estimation: adaptive estimators $\hat f_n$ such as those based on Lepski's
method [23] or wavelet thresholding [7] can be shown to satisfy
simultaneously

$$\sup_{f \in \Sigma(s)} E_f\|\hat f_n - f\|_\infty = O(r_n(s)), \qquad \sup_{f \in \Sigma(r)} E_f\|\hat f_n - f\|_\infty = O(r_n(r));$$

see [10, 11, 13] and Theorem 5 below. So while $\hat f_n$ adapts to the
unknown smoothness $s$, Theorem 1 reflects the fact that knowledge of the
smoothness is still not accessible for the statistician.

Should we therefore abstain from using adaptive estimators such as
$\hat f_n$ for inference? Giné and Nickl [12] recently suggested a new
approach to this problem, partly inspired by Picard and Tribouley [28]. In
[12] it was shown that one can construct confidence bands $C_n$ and subsets
$\bar\Sigma(\varepsilon, r) \subset \Sigma(r)$, defined by a concrete
analytical condition that involves the constant $\varepsilon > 0$, such
that $C_n$ is asymptotically honest for

$$\mathcal{P}_\varepsilon = \Sigma(s) \cup \bar\Sigma(\varepsilon, r)$$






for every fixed $\varepsilon > 0$, and such that $C_n$ is adaptive in the
sense of (1.3). Moreover, these subsets were shown to be topologically
generic in the sense that the set

$$\{ f \in \Sigma(r) \text{ but } f \notin \bar\Sigma(\varepsilon, r) \text{ for any } \varepsilon > 0 \}$$

that was removed is nowhere dense in the Hölder norm topology of
$\mathcal{C}^r$ (in fact in the relevant trace topology on densities). This
says that the functions $f \in \mathcal{P}^{\mathrm{all}}$ that prevent
adaptation in Theorem 1 are in a certain sense negligible.

In this article we shall give a more statistical interpretation of when
and, if so, why adaptive inference is possible over certain subsets of
Hölder classes. Our approach will also shed new light on why adaptation is
possible over the sets $\bar\Sigma(\varepsilon, r)$. Define, for $s > r$,
the following class:

(1.4) $$\tilde\Sigma(r, \rho_n) := \tilde\Sigma(r, s, \rho_n, B) = \Big\{ f \in \Sigma(r, B) : \inf_{g \in \Sigma(s)} \|g - f\|_\infty \ge \rho_n \Big\},$$

where $\rho_n$ is a sequence of nonnegative real numbers. Clearly
$\tilde\Sigma(r, 0) = \Sigma(r)$, but if $\rho_n > 0$, then we are removing
those elements from $\Sigma(r)$ that are not separated away from
$\Sigma(s)$ in sup-norm distance by at least $\rho_n$. Inspection of the
proof of Theorem 2 shows that the set removed from
$\Sigma(r) \setminus \Sigma(s)$ is nonempty as soon as $\rho_n > 0$.

As above, we are interested in finding a confidence band that is honest
over the class

$$\mathcal{P}(\rho_n) := \Sigma(s) \cup \tilde\Sigma(r, \rho_n),$$

and that is adaptive in the sense of (1.3), in fact only in the sense that

(1.5) $$\sup_{f \in \Sigma(s)} E_f|C_n| \le L r_n(s), \qquad \sup_{f \in \tilde\Sigma(r, \rho_n)} E_f|C_n| \le L r_n(r)$$

for every $n$ (large enough). We know from Low's results that this is
impossible if $\rho_n = 0$, but the question arises as to whether this
changes if $\rho_n > 0$, and if so, what the smallest admissible choice for
$\rho_n$ is.

It was already noted or implicitly used in [1, 5, 15, 19, 29] that there is
a generic connection between adaptive confidence sets and minimax
distinguishability of certain nonparametric hypotheses. In our setting
consider, for instance, testing the hypothesis

$$H_0 : f_0 = 1 \quad \text{against} \quad H_1 : f_0 \in \mathcal{M}, \quad \mathcal{M} \text{ finite},\ \mathcal{M} \subset \tilde\Sigma(r, \rho_n).$$

As we shall see in the proof of Theorem 2 below, an adaptive confidence
band over $\mathcal{P}(\rho_n)$ can be used to test any such hypothesis
consistently, and intuitively speaking an adaptive confidence band should
thus only exist if $\rho_n$ is of larger order than the minimax rate of
testing between $H_0$ and $H_1$ in the sense of Ingster [16, 17]; see also
the monograph [18]. For confidence bands a natural separation metric is the
supremum-norm (see, however, also the discussion in the last paragraph of
the Introduction), and an exploration of the corresponding testing problems
gives our main result, which confirms this intuition and shows moreover
that this lower bound is sharp up to constants at least in the case where
$B$ is known.

Theorem 2. Let $s > r > 0$. An adaptive and honest confidence band over

$$\Sigma(s) \cup \tilde\Sigma(r, \rho_n)$$

exists if and only if $\rho_n$ is greater than or equal to the minimax rate
of testing between $H_0 : f_0 \in \Sigma(s)$ and
$H_1 : f_0 \in \tilde\Sigma(r, \rho_n)$, and this rate equals $r_n(r)$.
More precisely:

(a) Suppose that $C_n$ is a confidence band that is asymptotically honest
with level $\alpha < 0.5$ over $\Sigma(s) \cup \tilde\Sigma(r, \rho_n)$ and
that is adaptive in the sense of (1.5). Then necessarily

$$\liminf_n \frac{\rho_n}{r_n(r)} > 0.$$

(b) Suppose $B, r, s$ and $0 < \alpha < 1$ are given. Then there exists a
sequence $\rho_n$ satisfying

$$\limsup_n \frac{\rho_n}{r_n(r)} < \infty$$

and a confidence band $C_n = C_n(B, r, s, \alpha; X_1, \ldots, X_n)$ that
is asymptotically honest with level $\alpha$ and adaptive over
$\Sigma(s) \cup \tilde\Sigma(r, \rho_n)$ in the sense of (1.5).

(c) Claims (a) and (b) still hold true if $\Sigma(s)$ is replaced by the set

$$\Big\{ f \in \Sigma(s) : \inf_{g \in \Sigma(t)} \|g - f\|_\infty \ge B r_n(s)/2 \Big\}$$

for any $t > s$.

The last claim shows that the situation does not change if one removes
similar subsets from the smaller Hölder ball $\Sigma(s)$; in particular
removing the standard null-hypothesis $f_0 = 1$ used in the nonparametric
testing literature, or other very smooth densities, cannot improve the
lower bound for $\rho_n$.

Part (b) of Theorem 2 implies the following somewhat curious corollary:
since any $f \in \Sigma(r) \setminus \Sigma(s)$ satisfies
$\inf_{g \in \Sigma(s)} \|g - f\|_\infty > 0$ (note that $\Sigma(s)$ is
$\|\cdot\|_\infty$-compact), we conclude that
$f \in \tilde\Sigma(r, L r_n(r))$ for every $L > 0$,
$n \ge n_0(f, r, L)$ large enough. We thus have:

Corollary 1. There exists a "dishonest" adaptive confidence band
$C_n := C_n(B, r, s, \alpha; X_1, \ldots, X_n)$ that has asymptotic
coverage for every fixed $f \in \mathcal{P}^{\mathrm{all}}$; that is, $C_n$
satisfies

$$\liminf_n \Pr\nolimits_f(f \in C_n) \ge 1 - \alpha \quad \forall f \in \mathcal{P}^{\mathrm{all}}$$

and

$$f \in \Sigma(s) \ \Rightarrow\ E_f|C_n| = O(r_n(s)), \qquad f \in \Sigma(r) \ \Rightarrow\ E_f|C_n| = O(r_n(r)).$$

A comparison to Theorem 1 highlights the subtle difference between the
minimax paradigm and asymptotic results that hold pointwise in $f$: if one
relaxes "honesty," that is, if one removes the infimum in (1.2), then Low's
impossibility result completely disappears. Note, however, that the index
$n$ from which onwards coverage holds in Corollary 1 depends on $f$, so
that the asymptotic result cannot be confidently used for inference at a
fixed sample size. This is a reflection of the often neglected fact that
asymptotic results that are pointwise in $f$ have to be used with care for
statistical inference; see [3, 22] for related situations of this kind.

In contrast to the possibly misleading conclusion of Corollary 1, Theorem 2
characterizes the boundaries of "honest" adaptive inference, and several
questions arise.

(i) What is the relationship between the sets $\tilde\Sigma(r, \rho_n)$
from Theorem 2 and the classes $\bar\Sigma(\varepsilon, r)$ considered in
[12]? Moreover, is there a "Bayesian" interpretation of the exceptional
sets that complements the topological one?

(ii) The typical adaptation problem is not one over two classes, but over a
scale of classes indexed by a possibly continuous smoothness parameter. Can
one extend Theorem 2 to such a setting and formulate natural, necessary and
sufficient conditions for the existence of confidence bands that adapt over
a continuous scale of Hölder classes?

(iii) Can one construct "practical" adaptive nonparametric confidence
bands? For instance, can one use bands that are centered at wavelet or
kernel estimators with data-driven bandwidths? In particular can one
circumvent having to know the radius $B$ of the Hölder balls in the
construction of the bands?

We shall give some answers to these questions in the remainder of the 
article, and summarize our main findings here. 

About question (i): we show in Proposition 3 that the "statistical"
separation of $\Sigma(r)$ and $\Sigma(s)$ using the sup-norm distance as in
(1.4) enforces a weaker condition on $f \in \Sigma(r)$ than the analytic
approach in [12], so that the present results are strictly more general for
fixed smoothness parameters $s$. We then move on to give a Bayesian
interpretation of the classes $\tilde\Sigma(r, \rho_n)$ and
$\bar\Sigma(\varepsilon, r)$: we show in Proposition 4 that a natural
Bayesian prior arising from "uniformly" distributing suitably scaled
wavelets on $\Sigma(r)$ concentrates on the classes
$\tilde\Sigma(r, \rho_n)$ and $\bar\Sigma(\varepsilon, r)$ with
overwhelming probability.

About question (ii): if the radius $B$ of the Hölder balls involved is
known, then one can combine a natural testing approach with recent results
in [10, 11, 13] to prove the existence of adaptive nonparametric confidence
bands over a scale of Hölder classes indexed by a grid of smoothness
parameters that grows dense in any fixed interval
$[r, R] \subset (0, \infty)$ as $n \to \infty$; see Theorems 3, 4.

A full answer to question (iii) lies beyond the scope of this paper. Some 
partial findings that seem of interest are the following: note first that our 
results imply that the logarithmic penalties that occurred in the diameters 
of the adaptive confidence bands in [12] are not necessary if one knows the 
radius B. On the other hand we show in Proposition 1 that if the radius B 
is unknown, then a certain price in the rate of convergence of the confidence 
band cannot be circumvented, as B cannot reliably be estimated without 
additional assumptions on the model. This partly justifies the practice of 
undersmoothing in the construction of confidence bands, dating back to 
Bickel and Rosenblatt [2]. It leads us to argue that near-adaptive confidence
bands that can be used in practice, and that do not require the knowledge 
of B, are more likely to follow from the classical adaptive techniques, like 
Lepski's method applied to classical kernel or wavelet estimators, rather 
than from the "testing approach" that we employ here to prove existence of 
optimal procedures. 

To conclude: the question as to whether adaptive methods should be used for
inference clearly remains a "philosophical" one, but we believe that our
results shed new light on the problem. That full adaptive inference is not
possible is a consequence of the fact that the typical smoothness classes
over which one wants to adapt, such as Hölder balls, contain elements that
are indistinguishable from a testing point of view. On the other hand
Hölder spaces are used by statisticians to model regularity properties of
unknown functions $f$, and it may seem sensible to exclude functions whose
regularity is not statistically identifiable. Our main results give minimal
identifiability conditions of a certain kind that apply in this particular
case.

Our findings apply also more generally to the adaptation problem discussed
at the beginning of this introduction with two abstract classes
$\Sigma, \Sigma'$. We are primarily interested in confidence statements
that Cai and Low [4] coin strongly adaptive (see Section 2.2 in their
paper) and in our case this corresponds precisely to requiring (1.2) and
(1.3). If $\Sigma, \Sigma'$ are convex, and if one is interested in a
confidence interval for a linear functional of the unknown parameter, Cai
and Low show that whether strong adaptation is possible or not is related
to the so-called "inter-class modulus" between $\Sigma, \Sigma'$, and their
results imply that in several relevant adaptation problems strongly
adaptive confidence statements are impossible. The "separation-approach"
put forward in the present article (following [12]) shows how strong
adaptation can be rendered possible at the expense of imposing statistical
identifiability conditions on $\Sigma, \Sigma'$, as follows: one first
proves existence of a risk-adaptive estimator $\hat f_n$ over
$\Sigma, \Sigma'$ in some relevant loss function. Subsequently one chooses
a functional $\mathsf{F} : \Sigma \times \Sigma' \to [0, \infty)$, defines
the nonparametric model

$$\mathcal{P}_n := \Sigma \cup \Big\{ f \in \Sigma' \setminus \Sigma : \inf_{g \in \Sigma} \mathsf{F}(g, f) \ge \rho_n \Big\}$$

and derives the minimax rate $\rho_n$ of testing $H_0 : f \in \Sigma$
against the generally nonconvex alternative
$\{ f \in \Sigma' \setminus \Sigma : \inf_{g \in \Sigma} \mathsf{F}(g, f) \ge \rho_n \}$.
Combining consistent tests for these hypotheses with $\hat f_n$ allows for
the construction of confidence statements under sharp conditions on
$\rho_n$. A merit of this approach is that the resulting confidence
statements are naturally compatible with the statistical accuracy of the
adaptive estimator used in the first place. An important question in this
context, which is beyond the scope of the present paper, is the optimal
choice of the functional $\mathsf{F}$: for confidence bands it seems
natural to take $\mathsf{F}(f, g) = \|f - g\|_\infty$, but formalizing this
heuristic appears not to be straightforward. In more general settings it
may be less obvious how to choose $\mathsf{F}$. These remain interesting
directions for future research.

2. Proof of Theorem 2 and further results. Let $X_1, \ldots, X_n$ be i.i.d.
with probability density $f$ on $T$ which we shall take to equal either
$T = [0,1]$ or $T = \mathbb{R}$. We shall use basic wavelet theory
[6, 14, 27] freely throughout this article, and we shall say that the
wavelet basis is $S$-regular if the corresponding scaling functions
$\phi_k$ and wavelets $\psi_k$ are compactly supported and $S$-times
continuously differentiable on $T$. For instance, we can take Daubechies
wavelets of sufficiently large order $N = N(S)$ on $T = \mathbb{R}$ (see
[27]) or on $T = [0,1]$ (Section 4 in [6]).

We define Hölder spaces in terms of the moduli of the wavelet coefficients
of continuous functions. The wavelet basis consists of the translated
scaling functions $\phi_k$ and wavelets
$\psi_{lk} = 2^{l/2} \psi_k(2^l(\cdot))$, where we add the boundary
corrected scaling functions and wavelets in case $T = [0,1]$. If
$T = \mathbb{R}$ the indices $k, l$ satisfy $l \in \mathbb{N} \cup \{0\}$,
$k \in \mathbb{Z}$, but if $T = [0,1]$ we require $l \ge J_0$ for some
fixed integer $J_0 = J_0(N)$ and then $k = 1, \ldots, 2^l$ for the
$\psi_{lk}$'s, $k = 1, \ldots, N < \infty$ for the $\phi_k$'s. Note that
$\psi_{lk} = 2^{l/2} \psi(2^l(\cdot) - k)$ for a fixed wavelet $\psi$ if
either $T = \mathbb{R}$ or if $\psi_{lk}$ is supported in the interior of
$[0,1]$. Write shorthand $\alpha_k(h) = \int h \phi_k$,
$\beta_{lk}(h) = \int h \psi_{lk}$.

Definition 1. Denote by $C(T)$ the space of bounded continuous real-valued
functions on $T$, and let $\phi_k$ and $\psi_k$ be $S$-regular Daubechies
scaling and wavelet functions, respectively. For $s < S$, the Hölder space
$\mathcal{C}^s(T)$ ($= \mathcal{C}^s$ when no confusion may arise) is
defined as the set of functions

$$\Big\{ f \in C(T) : \|f\|_{s,\infty} := \max\Big( \sup_k |\alpha_k(f)|,\ \sup_{k,l} 2^{l(s+1/2)} |\beta_{lk}(f)| \Big) < \infty \Big\}.$$

Define, moreover, for $s > 0$, $B \ge 1$, the class of densities

(2.1) $$\Sigma(s) := \Sigma(s, B, T) = \Big\{ f : T \to [0, \infty),\ \int_T f(x)\,dx = 1,\ \|f\|_{s,\infty} \le B \Big\}.$$

It is a standard result in wavelet theory (Chapter 6.4 in [27] for
$T = \mathbb{R}$ and Theorem 4.4 in [6] for $T = [0,1]$) that
$\mathcal{C}^s$ is equal, with equivalent norms, to the classical
Hölder-Zygmund spaces $C^s$. For $T = \mathbb{R}$, $0 < s < 1$, these
spaces consist of all functions $f \in C(\mathbb{R})$ for which
$\|f\|_\infty + \sup_{x \neq y,\, x, y \in \mathbb{R}} (|f(x) - f(y)| / |x - y|^s)$
is finite. For noninteger $s > 1$ the space $C^s$ is defined by requiring
$D^{[s]} f$ of $f \in C(\mathbb{R})$ to exist and to be contained in
$C^{s - [s]}$. The Zygmund class $C^1$ is defined by requiring
$|f(x + y) + f(x - y) - 2 f(x)| \le C|y|$ for all $x, y \in \mathbb{R}$,
some $0 < C < \infty$ and $f \in C(\mathbb{R})$, and the case
$m < s < m + 1$ follows by requiring the same condition on the $m$th
derivative of $f$. The definitions for $T = [0,1]$ are similar; we refer to
[6].

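Define the projection kernel $K(x, y) = \sum_k \phi_k(x) \phi_k(y)$ and
write

$$K_j(f)(x) = 2^j \int_T K(2^j x, 2^j y) f(y)\,dy = \sum_k \alpha_k(f) \phi_k(x) + \sum_{l = J_0}^{j-1} \sum_k \beta_{lk}(f) \psi_{lk}(x)$$

for the partial sum of the wavelet series of a function $f$ at resolution
level $j \ge J_0 + 1$, with the convention that $J_0 = 0$ if
$T = \mathbb{R}$.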

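If $X_1, \ldots, X_n$ are i.i.d. $\sim f$ then an unbiased estimate of
$K_j(f)$ is, for $\hat\alpha_k = (1/n) \sum_{i=1}^n \phi_k(X_i)$,
$\hat\beta_{lk} = (1/n) \sum_{i=1}^n \psi_{lk}(X_i)$ the empirical wavelet
coefficients,

(2.2) $$f_n(x, j) = \frac{2^j}{n} \sum_{i=1}^n K(2^j x, 2^j X_i) = \sum_k \hat\alpha_k \phi_k(x) + \sum_{l = J_0}^{j-1} \sum_k \hat\beta_{lk} \psi_{lk}(x).$$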
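As a concrete illustration of the linear estimator (2.2), the following Python sketch treats the simplest special case: the Haar basis on $[0,1]$ with $J_0 = 0$, for which projecting onto resolution level $j$ amounts to a histogram with $2^j$ dyadic bins. This choice is an assumption made purely for illustration; the Haar basis is not $S$-regular for $S \ge 1$, so it does not satisfy the regularity requirements imposed in the text, and the function name `haar_projection_estimate` is hypothetical.

```python
def haar_projection_estimate(sample, j, x):
    """Evaluate f_n(x, j) for the Haar basis on [0, 1].

    For Haar wavelets the projection kernel at level j is
    2^j * 1{x and y lie in the same dyadic interval of length 2^-j},
    so f_n(x, j) = 2^j * (fraction of sample points in the bin of x).
    """
    bins = 2 ** j
    counts = [0] * bins
    for xi in sample:
        # Points equal to 1.0 are assigned to the last bin.
        counts[min(int(xi * bins), bins - 1)] += 1
    k = min(int(x * bins), bins - 1)
    return bins * counts[k] / len(sample)


if __name__ == "__main__":
    data = [0.1, 0.2, 0.6, 0.9]
    for point in (0.1, 0.3, 0.6, 0.9):
        print(point, haar_projection_estimate(data, 2, point))
```

The estimate is piecewise constant and integrates to one; a smoother basis would be needed to exploit Hölder smoothness beyond order one.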

2.1. Proof of Theorem 2. We shall first prove Theorem 2 to lay out the main
ideas. We shall prove claims (a) and (b); that this also solves the testing
problem $H_0 : f_0 \in \Sigma(s)$ against
$H_1 : f_0 \in \tilde\Sigma(r, \rho_n)$ follows from the proofs. The proof
of claim (c) is postponed to Section 3. Let us assume $B > 2$ to simplify
some notation. Take $j_n^* \in \mathbb{N}$ such that

$$2^{j_n^*} \simeq \Big( \frac{n}{\log n} \Big)^{1/(2r+1)}$$

is satisfied, where $\simeq$ denotes two-sided inequalities up to universal
constants.

($\Leftarrow$): Let us show that $\liminf_n (\rho_n / r_n(r)) = 0$ leads to
a contradiction. In this case $\rho_n / r_n(r) \to 0$ along a subsequence
of $n$, and we shall still index this subsequence by $n$. Let $f_0 = 1$ on
$[0,1]$ and define, for $\varepsilon > 0$, the functions

$$f_m := f_0 + \varepsilon 2^{-j(r + 1/2)} \psi_{jm},$$

where $m = 1, \ldots, M$, $c_0 2^j \le M \le 2^j$, $j \ge 0$, $c_0 > 0$,
and where $\psi$ is a Daubechies wavelet of regularity greater than $s$,
chosen in such a way that $\psi_{jm}$ is supported in the interior of
$[0,1]$ for every $m$ and $j$ large enough. (This is possible using the
construction in Theorem 4.4 in [6].) Since $\int_0^1 \psi = 0$ we have
$\int_0^1 f_m = 1$ for every $m$ and also $f_m \ge 0\ \forall m$ if
$\varepsilon > 0$ is chosen small enough depending only on
$\|\psi\|_\infty$. Moreover, for any $t > 0$, using the definition of
$\|\cdot\|_{t,\infty}$ and since
$c(\phi) \equiv \sup_k |\int_0^1 \phi_k| \le \sup_k \|\phi_k\|_2 = 1$,

(2.3) $$\|f_m\|_{t,\infty} = \max\big( c(\phi),\ \varepsilon 2^{j(t - r)} \big), \qquad m = 1, \ldots, M,$$

so $f_m \in \Sigma(r)$ for $\varepsilon < 2$ (recall $B > 2$) and every
$j$, but $f_m \notin \Sigma(s)$ for $j$ large enough depending only on
$s, r, B, \varepsilon$.
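Note next that

$$\Big| 2^{l/2} \int \psi_k(2^l x) h(x)\,dx \Big| \le 2^{-l/2} \|\psi_k\|_1 \|h\|_\infty \le 2^{-l/2} \|h\|_\infty$$

for every $l, k$, and any bounded function $h$ implies

(2.4) $$\|h\|_\infty \ge \sup_{l \ge 0,\, k} 2^{l/2} |\beta_{lk}(h)|$$

so that, for $g \in \Sigma(s)$ arbitrary,

(2.5) $$\|f_m - g\|_\infty \ge \sup_{l \ge 0,\, k} 2^{l/2} |\beta_{lk}(f_m) - \beta_{lk}(g)| \ge \varepsilon 2^{-jr} - 2^{j/2} |\beta_{jm}(g)| \ge \varepsilon 2^{-jr} - B 2^{-js} \ge \frac{\varepsilon}{2} 2^{-jr}$$

for every $m$ and for $j \ge j_0$, $j_0 = j_0(s, r, B, \varepsilon)$.
Summarizing we see that

$$f_m \in \tilde\Sigma\Big( r, \frac{\varepsilon}{2} 2^{-jr} \Big) \qquad \forall m = 1, \ldots, M$$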



for every $j \ge j_0$. Since $\rho_n = o(r_n(r))$,
$r_n(r) \simeq 2^{-j_n^* r}$, we can find $j_n > j_n^*$ such that

(2.6) $$\rho_n' := \max(\rho_n, r_n(s) \log n) \le \frac{\varepsilon}{2} 2^{-j_n r} = o(2^{-j_n^* r}),$$

in particular $f_m \in \tilde\Sigma(r, \rho_n')$ for every
$m = 1, \ldots, M$ and every $n \ge n_0$,
$n_0 = n_0(s, r, B, \varepsilon)$.

Suppose now $C_n$ is a confidence band that is adaptive and honest over
$\Sigma(s) \cup \tilde\Sigma(r, \rho_n)$, and consider testing

$$H_0 : f = f_0 \quad \text{against} \quad H_1 : f \in \{f_1, \ldots, f_M\} =: \mathcal{M}.$$

Define a test $\Psi_n$ as follows: if no $f_m \in C_n$, then
$\Psi_n = 0$, but as soon as one of the $f_m$'s is contained in $C_n$, then
$\Psi_n = 1$. We control the error probabilities of this test. Using (2.5),
Markov's inequality, adaptivity of the band, (2.6) and noting
$r_n(s) = o(\rho_n')$, we deduce

$$\begin{aligned}
\Pr\nolimits_{f_0}(\Psi_n \neq 0) &= \Pr\nolimits_{f_0}(f_m \in C_n \text{ for some } m) \\
&= \Pr\nolimits_{f_0}(f_m, f_0 \in C_n \text{ for some } m) + \Pr\nolimits_{f_0}(f_m \in C_n \text{ for some } m,\ f_0 \notin C_n) \\
&\le \Pr\nolimits_{f_0}(\|f_m - f_0\|_\infty \le |C_n| \text{ for some } m) + \alpha + o(1) \\
&\le \Pr\nolimits_{f_0}(|C_n| \ge \rho_n') + \alpha + o(1) \\
&\le E_{f_0}|C_n| / \rho_n' + \alpha + o(1) = \alpha + o(1).
\end{aligned}$$

Under any alternative $f_m \in \tilde\Sigma(r, \rho_n')$, invoking honesty
of the band we have

$$\Pr\nolimits_{f_m}(\Psi_n = 0) = \Pr\nolimits_{f_m}(\text{no } f_k \in C_n) \le \Pr\nolimits_{f_m}(f_m \notin C_n) \le \alpha + o(1)$$

so that summarizing we have

(2.7) $$\limsup_n \Big( E_{f_0} \Psi_n + \sup_{f \in \mathcal{M}} E_f(1 - \Psi_n) \Big) \le 2\alpha < 1.$$
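The perturbed densities $f_m$ used in this argument can be checked numerically in a toy case. The sketch below is an illustration only: it replaces the Daubechies wavelet of the proof by the Haar wavelet (an assumption; Haar is not regular enough for the actual argument), indexes translates from $0$, and verifies on a grid that $f_m$ is nonnegative, integrates to one, and sits at sup-norm distance $\varepsilon 2^{-jr}$ from $f_0 = 1$. All function names are hypothetical.

```python
def haar_psi(u):
    """Haar mother wavelet on [0, 1)."""
    if 0.0 <= u < 0.5:
        return 1.0
    if 0.5 <= u < 1.0:
        return -1.0
    return 0.0


def f_m(x, j, m, r, eps):
    """f_m = 1 + eps * 2^{-j(r+1/2)} * psi_{jm}, psi_{jm}(x) = 2^{j/2} psi(2^j x - m)."""
    psi_jm = 2.0 ** (j / 2.0) * haar_psi(2.0 ** j * x - m)
    return 1.0 + eps * 2.0 ** (-j * (r + 0.5)) * psi_jm


def check_perturbation(j, m, r, eps):
    """Return (min value, midpoint-rule integral, sup distance to f_0 = 1)."""
    size = 2 ** (j + 3)  # grid fine enough to resolve the support of psi_{jm}
    values = [f_m((i + 0.5) / size, j, m, r, eps) for i in range(size)]
    return min(values), sum(values) / size, max(abs(v - 1.0) for v in values)


if __name__ == "__main__":
    print(check_perturbation(j=4, m=3, r=0.5, eps=0.5))
```

The third returned value equals $\varepsilon 2^{-jr}$, which is the separation that (2.5) and (2.6) compare with $\rho_n'$.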

On the other hand, if $\Psi$ is any test (any measurable function of the
sample taking values 0 or 1), we shall now prove

(2.8) $$\liminf_n \inf_\Psi \Big( E_{f_0} \Psi + \sup_{f \in \mathcal{M}} E_f(1 - \Psi) \Big) \ge 1,$$

which contradicts (2.7) and completes this direction of the proof. The
proof follows ideas in [16]. We have, for every $\eta > 0$,

$$\begin{aligned}
E_{f_0} \Psi + \sup_{f \in \mathcal{M}} E_f(1 - \Psi) &\ge E_{f_0}(1\{\Psi = 1\}) + \frac{1}{M} \sum_{m=1}^M E_{f_m}(1 - \Psi) \\
&\ge E_{f_0}(1\{\Psi = 1\} + 1\{\Psi = 0\} Z) \\
&\ge (1 - \eta) \Pr\nolimits_{f_0}(Z \ge 1 - \eta),
\end{aligned}$$

where $Z = M^{-1} \sum_{m=1}^M (dP_m^n / dP_0^n)$ with $P_m^n$ the product
probability measures induced by a sample of size $n$ from the density
$f_m$. By Markov's inequality,

$$\Pr\nolimits_{f_0}(Z \ge 1 - \eta) \ge 1 - \frac{E_{f_0}|Z - 1|}{\eta} \ge 1 - \frac{\sqrt{E_{f_0}(Z - 1)^2}}{\eta}$$

for every $\eta > 0$, and we show that the last term converges to zero.
Writing (in abuse of notation) $\gamma_j = \varepsilon 2^{-j_n(r + 1/2)}$,
using independence, orthonormality of $\psi_{jm}$ and
$\int \psi_{jm} = 0$ repeatedly as well as $(1 + x) \le e^x$, we see

$$\begin{aligned}
E_{f_0}(Z - 1)^2 &= \frac{1}{M^2} \sum_{m=1}^M \Big( \int_{[0,1]^n} \prod_{i=1}^n (1 + \gamma_j \psi_{jm}(x_i))^2\,dx - 1 \Big) \\
&= \frac{1}{M^2} \sum_{m=1}^M \Big( \Big( \int_{[0,1]} (1 + \gamma_j \psi_{jm}(x))^2\,dx \Big)^n - 1 \Big) \\
&= \frac{1}{M} \big( (1 + \gamma_j^2)^n - 1 \big) \le \frac{e^{n \gamma_j^2}}{M}.
\end{aligned}$$

Now using (2.6) we see
$n \gamma_j^2 = \varepsilon^2 n 2^{-j_n(2r + 1)} = o(\log n)$ so that
$e^{n \gamma_j^2} = o(n^\kappa)$ for every $\kappa > 0$, whereas
$M \simeq 2^{j_n} \ge 2^{j_n^*} \simeq r_n(r)^{-1/r}$ still diverges at a
fixed polynomial rate in $n$, so that the last quantity converges to zero,
which proves (2.8) since $\eta$ was arbitrary.
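The final bound in this computation is easy to evaluate numerically. The following Python sketch compares the exact second moment $((1 + \gamma_j^2)^n - 1)/M$ with the upper bound $e^{n\gamma_j^2}/M$, taking $M = \lfloor c_0 2^{j} \rfloor$. The function name `second_moment_bound`, the default $c_0 = 1$ and the numerical values in the usage lines are illustrative assumptions rather than choices made in the paper.

```python
import math


def second_moment_bound(n, j, r, eps, c0=1.0):
    """Return (exact, upper) for E_{f_0}(Z - 1)^2 with gamma^2 = eps^2 2^{-j(2r+1)}.

    exact = ((1 + gamma^2)^n - 1) / M, upper = exp(n * gamma^2) / M,
    where M = floor(c0 * 2^j) is the number of alternatives.
    """
    gamma_sq = eps ** 2 * 2.0 ** (-j * (2.0 * r + 1.0))
    m_count = max(1, int(c0 * 2 ** j))
    # expm1/log1p keep the computation accurate when gamma_sq is tiny.
    exact = math.expm1(n * math.log1p(gamma_sq)) / m_count
    upper = math.exp(n * gamma_sq) / m_count
    return exact, upper


if __name__ == "__main__":
    # Illustrative values: j is taken above the level j_n^* for r = 0.5.
    print(second_moment_bound(n=10**6, j=12, r=0.5, eps=1.0))
```

Both quantities are small here because $n\gamma_j^2$ stays bounded while $M$ grows like $2^{j}$, which is the mechanism behind (2.8).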

($\Rightarrow$): Let us now show that an adaptive band $C_n$ can be
constructed if $\rho_n$ equals $r_n(r)$ times a large enough constant, and
if the radius $B$ is known. The remarks after Definition 1 imply that
$\|f\|_\infty \le k \|f\|_{s,\infty} \le kB$ for some $k > 0$. Set

(2.9) $$\sigma(j) := \sigma(n, j) := \sqrt{kB \frac{2^j j}{n}}, \qquad \rho_n := L' \sigma(j_n^*) \simeq r_n(r)$$

for $L'$ a constant to be chosen later. Using Definition 1 and
$\sup_x \sum_k |\psi_k(x)| < \infty$, we have for $f_n$ from (2.2) based on
wavelets of regularity $S > s$

(2.10) $$\|E_f f_n(j_n^*) - f\|_\infty = \|K_{j_n^*}(f) - f\|_\infty \le b_0 2^{-j_n^* r} \le b \sigma(j_n^*)$$

for some constants $b_0, b$ that depend only on $B, \psi$.

Define the test statistic
$\hat d_n := \inf_{g \in \Sigma(s)} \|f_n(j_n^*) - g\|_\infty$. Let now
$\hat f_n(y)$ be any estimator for $f$ that is exact rate adaptive over
$\Sigma(s) \cup \Sigma(r)$ in sup-norm risk; that is, $\hat f_n$ satisfies
simultaneously, for some fixed constant $D$ depending only on $B, s, r$,

(2.11) $$\sup_{f \in \Sigma(r)} E_f\|\hat f_n - f\|_\infty \le D r_n(r), \qquad \sup_{f \in \Sigma(s)} E_f\|\hat f_n - f\|_\infty \le D r_n(s).$$

Such estimators exist; see Theorem 5 below. Define the confidence band
$C_n = \{C_n(y), y \in [0,1]\}$ to equal

$$\hat f_n(y) \pm L r_n(r) \ \text{ if } \hat d_n > \tau \quad \text{and} \quad \hat f_n(y) \pm L r_n(s) \ \text{ if } \hat d_n \le \tau, \qquad y \in [0,1],$$

where $\tau = \kappa \sigma(j_n^*)$, and where $\kappa$ and $L$ are
constants to be chosen below.
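The decision rule behind this band can be sketched in a few lines of Python. The sketch below is a hedged illustration, not the authors' procedure in implementable form: it takes the value of the test statistic $\hat d_n$ and the values of an adaptive estimator on a grid as given inputs (computing the infimum over $\Sigma(s)$ is not attempted), picks $j_n^*$ as the smallest integer with $2^{j} \ge (n/\log n)^{1/(2r+1)}$, and uses placeholder values for the constants $k$, $\kappa$ and $L$, which the proof requires to be chosen large enough. All function names are hypothetical.

```python
import math


def rate(n, s):
    """r_n(s) = (log n / n)^(s / (2s + 1))."""
    return (math.log(n) / n) ** (s / (2.0 * s + 1.0))


def resolution_level(n, r):
    """One possible choice of j_n^*: smallest j with 2^j >= (n / log n)^(1/(2r+1))."""
    return math.ceil(math.log2((n / math.log(n)) ** (1.0 / (2.0 * r + 1.0))))


def sigma(n, j, k, B):
    """sigma(n, j) = sqrt(k * B * 2^j * j / n) as in (2.9)."""
    return math.sqrt(k * B * 2 ** j * j / n)


def band(f_hat_values, d_hat, n, r, s, B, k=1.0, kappa=1.0, L=1.0):
    """Return intervals f_hat(y) +/- L r_n(r) if d_hat > tau, else +/- L r_n(s).

    k, kappa and L are placeholder constants (assumptions of this sketch).
    """
    tau = kappa * sigma(n, resolution_level(n, r), k, B)
    half_width = L * (rate(n, r) if d_hat > tau else rate(n, s))
    return [(v - half_width, v + half_width) for v in f_hat_values]


if __name__ == "__main__":
    print(band([1.0, 1.2], d_hat=1.0, n=10**4, r=0.5, s=2.0, B=2.0))
    print(band([1.0, 1.2], d_hat=0.0, n=10**4, r=0.5, s=2.0, B=2.0))
```

A large value of $\hat d_n$ signals that the density is far from $\Sigma(s)$, so the wider band of width $L r_n(r)$ is used; otherwise the narrower band of width $L r_n(s)$ is reported.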

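We first prove that $C_n$ is an honest confidence band for
$f \in \Sigma(s) \cup \tilde\Sigma(r, \rho_n)$ when $\rho_n$ is as above
with $L'$ large enough depending only on $\kappa, B$. If
$f \in \Sigma(s)$ we have coverage since adaptivity of $\hat f_n$ implies,
by Markov's inequality,

$$\inf_{f \in \Sigma(s)} \Pr\nolimits_f(f \in C_n) \ge 1 - \sup_{f \in \Sigma(s)} \Pr\nolimits_f(\|\hat f_n - f\|_\infty > L r_n(s)) \ge 1 - \frac{1}{L r_n(s)} \sup_{f \in \Sigma(s)} E_f\|\hat f_n - f\|_\infty$$

which can be made greater than $1 - \alpha$ for any $\alpha > 0$ by
choosing $L$ large enough depending only on $K, B, \alpha, r, s$. When
$f \in \tilde\Sigma(r, \rho_n)$ there is the danger of
$\hat d_n \le \tau$ in which case the size of the band is too small. In
this case, however, we have, using again Markov's inequality,

$$\inf_{f \in \tilde\Sigma(r, \rho_n)} \Pr\nolimits_f(f \in C_n) \ge 1 - \frac{\sup_{f \in \tilde\Sigma(r, \rho_n)} E_f\|\hat f_n - f\|_\infty}{L r_n(r)} - \sup_{f \in \tilde\Sigma(r, \rho_n)} \Pr\nolimits_f(\hat d_n \le \tau)$$

and the first term subtracted can be made smaller than $\alpha$ for $L$
large enough in view of (2.11). For the second note that
$\Pr_f(\hat d_n \le \tau)$ equals, for every
$f \in \tilde\Sigma(r, \rho_n)$,

$$\begin{aligned}
&\Pr\nolimits_f\Big( \inf_{g \in \Sigma(s)} \|f_n(j_n^*) - g\|_\infty \le \kappa \sigma(j_n^*) \Big) \\
&\quad \le \Pr\nolimits_f\Big( \inf_{g \in \Sigma(s)} \|f - g\|_\infty - \|f_n(j_n^*) - E_f f_n(j_n^*)\|_\infty - \|K_{j_n^*}(f) - f\|_\infty \le \kappa \sigma(j_n^*) \Big) \\
&\quad \le \Pr\nolimits_f\big( \rho_n - \|K_{j_n^*}(f) - f\|_\infty - \kappa \sigma(j_n^*) \le \|f_n(j_n^*) - E_f f_n(j_n^*)\|_\infty \big) \\
&\quad \le \Pr\nolimits_f\big( \|f_n(j_n^*) - E_f f_n(j_n^*)\|_\infty \ge (L' - \kappa - b) \sigma(j_n^*) \big) \\
&\quad \le c e^{-c j_n^*} = o(1)
\end{aligned}$$

for some $c > 0$, by choosing $L' = L'(\kappa, B, K)$ large enough
independent of $f \in \tilde\Sigma(r, \rho_n)$, in view of Proposition 5
below. This completes the proof of coverage of the band.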

We now turn to adaptivity of the band and verify (1.5). By definition of
$C_n$ we have almost surely

$$|C_n| \le L r_n(r),$$

so the case $f \in \tilde\Sigma(r, \rho_n)$ is proved. If
$f \in \Sigma(s)$ then, using (2.10) and Proposition 5,

$$\begin{aligned}
E_f|C_n| &\le L r_n(r) \Pr\nolimits_f(\hat d_n > \tau) + L r_n(s) \\
&\le L r_n(r) \Pr\nolimits_f\Big( \inf_{g \in \Sigma(s)} \|f_n(j_n^*) - g\|_\infty > \kappa \sigma(j_n^*) \Big) + L r_n(s) \\
&\le L r_n(r) \Pr\nolimits_f\big( \|f_n(j_n^*) - f\|_\infty > \kappa \sigma(j_n^*) \big) + L r_n(s) \\
&\le L r_n(r) \Pr\nolimits_f\big( \|f_n(j_n^*) - E_f f_n(j_n^*)\|_\infty > (\kappa - b) \sigma(j_n^*) \big) + L r_n(s) \\
&\le L r_n(r) c e^{-c j_n^*} + L r_n(s) = O(r_n(s))
\end{aligned}$$

since $c$ can be taken sufficiently large by choosing
$\kappa = \kappa(K, B)$ large enough. This completes the proof of the
second claim of Theorem 2.

2.2. Unknown radius B. The existence results in the previous section 
are not entirely satisfactory in that the bands constructed to prove exis- 
tence of adaptive procedures cannot be easily implemented. Particularly the 
requirement that the radius B of the Holder ball be known is restrictive. A 
first question is whether exact rate-adaptive bands exist if B is unknown, 
and the answer turns out to be no. This in fact is not specific to the adap- 
tive situation, and occurs already for a fixed Holder ball, as the optimal size 
of a confidence band depends on the radius B. The following proposition 
is a simple consequence of the formula for the exact asymptotic minimax 
constant for density estimation in sup-norm loss as derived in [21]. 

Proposition 1. Let X%, . . . ,X n be i.i.d. random variables taking values 
in [0, 1] with density f G £(r, B, [0, 1]) where < r < 1. Let C n be a confi- 
dence band that is asymptotically honest with level a for S(r, B, [0, 1]). Then 

liminf sup — ^ > cB p (l — a) 

n /g£(r,B,[0,l]) r n{r) 

for some fixed constants c,p > that depend only on r. 

In particular if C n does not depend on B, then Ef\C n \ cannot be of order 
r n {r) uniformly over T,(r,B, [0, 1]) for every B > 0, unless B can be reliably 
estimated, which for the full Holder ball is impossible without additional 
assumptions. It can be viewed as one explanation for why undersmoothing 
is necessary to construct "practical" asymptotic confidence bands. 
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2.3. Confidence bands for adaptive estimators. The usual risk-adaptive 
estimators such as those based on Lepski's [23] method or wavelet thresh- 
olding [7] do not require the knowledge of the Holder radius B. As shown 
in [12] (see also [20]) such estimators can be used in the construction of 
(near-)adaptive confidence bands under certain analytic conditions on the 
elements of E(s). Let us briefly describe the results in [12, 20]. Let £ n be a 
sequence of positive integers (typically £ n — > oo as n —¥ oo) and define, for 
K the wavelet projection kernel associated to some 5-regular wavelet basis, 
S>s 

(2.12) S(e, a, l n ) := {/ € E(s) : e2~ ls < \\K t (f) - /IU < B2~ ls VZ > £ n }. 

The conditions in [12, 20] are slightly weaker in that they have to hold only 
for I € [£n,£' n ] where £' n — £ n —> oo. This turns out to be immaterial in what 
follows, however, so we work with these sets to simplify the exposition. 

Whereas the upper bound in (2.12) is automatic for functions in $\Sigma(s)$, the lower bound is not. However one can show that a lower bound on $\|K_l(f) - f\|_\infty$ of order $2^{-ls}$ is "topologically" generic in the Hölder space $C^s(T)$. The following is Proposition 4 in [12].

Proposition 2. Let $K$ be $S$-regular with $S > s$. The set

$$ \{ f : \text{there exists no } \varepsilon > 0,\ l_0 \ge 0 \text{ s.t. } \|K_l(f) - f\|_\infty \ge \varepsilon 2^{-ls}\ \forall l \ge l_0 \} $$

is nowhere dense in the norm topology of $C^s(\mathbb{R})$.

Using this condition, [12] constructed an estimator $\hat f_n$ based on Lepski's method applied to a kernel or wavelet density estimator such that

$$ A_n \left( \sup_{y \in [0,1]} \left| \frac{\hat f_n(y) - f(y)}{\hat\sigma_n \sqrt{\hat f_n(y)}} \right| - B_n \right) \to^d Z \tag{2.13} $$

as $n \to \infty$, where $Z$ is a standard Gumbel random variable and where $A_n, B_n, \hat\sigma_n$ are some random constants. If $\ell_n$ is chosen such that

$$ 2^{\ell_n} \simeq \left( \frac{n}{\log n} \right)^{1/(2R+1)} \tag{2.14} $$

then the limit theorem (2.13) is uniform in relevant unions over $s \in [r, R]$, $r > 0$, of Hölder classes $\Sigma(\varepsilon, s, \ell_n)$. Since the constants $A_n, B_n, \hat\sigma_n$ in (2.13) are known, confidence bands can be retrieved directly from the limit distribution, and [12] further showed that so-constructed bands are near-adaptive: they shrink at rate $O_P(r_n(s) u_n)$ whenever $f \in \Sigma(\varepsilon, s, \ell_n)$, where $u_n$ can be taken of the size $\log n$. See Theorem 1 in [12] for detailed statements. As shown in Theorem 4 in [20], the restriction $u_n \simeq \log n$ can be relaxed to $u_n \to \infty$ as $n \to \infty$, at least if one is not after exact limiting distributions but only after asymptotic coverage inequalities, and this matches Proposition 1, so that these bands shrink at the optimal rate in the case where $B$ is unknown.

Obviously it is interesting to ask how the sets in (2.12) constructed from 
analytic conditions compare to the classes considered in Theorems 2, 3 and 
4 constructed from statistical separation conditions. The following result 
shows that the conditions in the present paper are strictly weaker than 
those in [12, 20] for the case of two fixed Holder classes, and also gives a 
more statistical explanation of why adaptation is possible over the classes 
from (2.12). 

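Proposition 3. Let $t > s$.

(a) Suppose $f \in \Sigma(\varepsilon, s, \ell_n)$ for some fixed $\varepsilon > 0$. Then $\inf_{g \in \Sigma(t)} \|f - g\|_\infty \ge c 2^{-\ell_n s}$ for some constant $c = c(\varepsilon, B, s, t, K)$. Moreover, if $2^{-\ell_n s}/r_n(s) \to \infty$ as $n \to \infty$, so in particular in the adaptive case as in (2.14), then, for every $L_0 > 0$,

$$ \Sigma(\varepsilon, s, \ell_n) \subset \tilde\Sigma(s, L_0 r_n(s)) $$

for $n \ge n_0(\varepsilon, B, s, t, L_0, K)$ large enough.

(b) If $\ell_n$ is s.t. $2^{-\ell_n s}/r_n(s) \to \infty$ as $n \to \infty$, so in particular in the adaptive case (2.14), then $\forall L_0' > 0, \varepsilon > 0$ the set

$$ \tilde\Sigma(s, L_0' r_n(s)) \setminus \Sigma(\varepsilon, s, \ell_n) $$

is nonempty for $n \ge n_0(s, t, K, B, L_0')$ large enough.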

2.4. A Bayesian perspective. Instead of analyzing the topological capacity of the set removed, one can try to quantify its size by some measure on the Hölder space $C^s$. As there is no translation-invariant measure available we consider certain probability measures on $C^s$ that have a natural interpretation as nonparametric Bayes priors.

Take any $S$-regular wavelet basis $\{\phi_k, \psi_{lk} : k \in \mathbb{Z}, l \in \mathbb{N}\}$ of $L^2([0,1])$, $S > s$. The wavelet characterization of $C^s([0,1])$ motivates to distribute the basis functions $\psi_{lk}$'s randomly on $\Sigma(s, B)$ as follows: take $u_{lk}$ i.i.d. uniform random variables on $[-B, B]$ and define the random wavelet series

$$ U_s(x) = 1 + \sum_{l=J}^{\infty} \sum_k 2^{-l(s+1/2)} u_{lk} \psi_{lk}(x) $$

which converges uniformly almost surely. It would be possible to set $J = 0$ and replace 1 by $\sum_k u_{0k} \phi_k$ below, but to stay within the density framework we work with this minor simplification, for which $\int_0^1 U_s(x)\,dx = 1$ as well as $U_s \ge 0$ almost surely if $J = J(\|\psi\|_\infty, B, s)$ is chosen large enough. Conclude that $U_s$ is a random density that satisfies

$$ \|U_s\|_{s,\infty} \le \max\Big( 1, \sup_{k,\, l \ge J} |u_{lk}| \Big) \le B \quad \text{a.s.}, $$

so its law is a natural prior on $\Sigma(s, B)$ that uniformly distributes suitably scaled wavelets on $\Sigma(s)$ around its expectation $E U_s = 1$.
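The construction of $U_s$ can be illustrated numerically. The following is only a sketch under stated assumptions, not the construction used in the proofs: it takes the Haar basis on $[0,1)$ in place of an $S$-regular basis with $S > s$ (chosen purely because it is easy to code), truncates the series at a hypothetical finest level `L_max`, and the names `haar` and `sample_random_density` are illustrative. For Haar wavelets one has $|U_s - 1| \le B \sum_{l \ge J} 2^{-ls}$, so a draw is positive as soon as $J$ makes this sum smaller than one.

```python
import random

def haar(l, k, x):
    """Haar wavelet psi_{lk}(x) = 2^{l/2} psi(2^l x - k) for x in [0, 1)."""
    y = (2 ** l) * x - k
    if 0.0 <= y < 0.5:
        return 2.0 ** (l / 2)
    if 0.5 <= y < 1.0:
        return -(2.0 ** (l / 2))
    return 0.0

def sample_random_density(s, B, J, L_max, rng):
    """Draw u_{lk} ~ Uniform[-B, B] for J <= l <= L_max and return x -> U_s(x)."""
    coeffs = {(l, k): rng.uniform(-B, B)
              for l in range(J, L_max + 1) for k in range(2 ** l)}

    def U(x):
        total = 1.0
        for l in range(J, L_max + 1):
            # At level l only the wavelet whose support contains x is nonzero.
            k = min(int((2 ** l) * x), 2 ** l - 1)
            total += 2.0 ** (-l * (s + 0.5)) * coeffs[(l, k)] * haar(l, k, x)
        return total

    return U

# Illustrative parameters (assumptions, not values from the text).
s, B, J, L_max = 0.5, 1.0, 5, 9
U = sample_random_density(s, B, J, L_max, random.Random(0))

# U is piecewise constant on cells of width 2^-(L_max+1), so averaging over
# cell midpoints reproduces the integral over [0, 1) up to rounding error.
N = 2 ** (L_max + 1)
values = [U((i + 0.5) / N) for i in range(N)]
integral = sum(values) / N
```

With these parameters $B \sum_{l=J}^{L_{\max}} 2^{-ls} < 1$, so the draw stays strictly positive, and the wavelet terms integrate to zero, so the draw integrates to one.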

Proposition 4. Let $K$ be the wavelet projection kernel associated to a $S$-regular wavelet basis $\phi, \psi$ of $L^2([0,1])$, $S > s$, and let $\varepsilon > 0$, $j \ge 0$. Then

$$ \Pr\{ \|K_j(U_s) - U_s\|_\infty < \varepsilon B 2^{-js} \} \le e^{-\log(1/\varepsilon) 2^j}. $$

By virtue of part (a) of Proposition 3 the same bound can be established, up to constants, for the probability of the sets $\Sigma(s) \setminus \tilde\Sigma(s, \rho_n)$ under the law of $U_s$.

Similar results (with minor modifications) could be proved if one replaces the $u_{lk}$'s by i.i.d. Gaussians, which leads to measures that have a structure similar to Gaussian priors used in Bayesian nonparametrics; see, for example, [30]. If we choose $j$ at the natural frequentist rate $2^j \simeq n^{1/(2s+1)}$, then the bound in Proposition 4 becomes $e^{-C n \delta_n^2(s)}$, $\delta_n(s) = n^{-s/(2s+1)}$, where $C > 0$ can be made as large as desired by choosing $\varepsilon$ small enough. In view of (2.3) in Theorem 2.1 in [9] one could therefore heuristically conclude that the exceptional sets are "effective null-sets" from the point of view of Bayesian nonparametrics.
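The rate calculation behind this remark can be written out in two lines. The display below is a sketch that reads $\simeq$ as exact equality (an assumption made only to keep the arithmetic clean) and uses the bound $\varepsilon^{2^j}$ obtained in the proof of Proposition 4.

```latex
\[
\varepsilon^{2^j} = e^{-\log(1/\varepsilon)\, 2^j},
\qquad
2^j = n^{1/(2s+1)} = n \cdot n^{-2s/(2s+1)} = n\, \delta_n^2(s),
\]
\[
\text{so that} \quad
\Pr\{ \|K_j(U_s) - U_s\|_\infty < \varepsilon B 2^{-js} \}
\le e^{-C n \delta_n^2(s)},
\qquad C = \log(1/\varepsilon).
\]
```

In this form it is visible that $C = \log(1/\varepsilon)$ grows without bound as $\varepsilon \to 0$.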

2.5. Adaptive confidence bands for collections of Hölder classes. The question arises of how Theorem 2 can be extended to adaptation problems over collections of Hölder classes whose smoothness degree varies in a fixed interval $[r, R] \subset (0, \infty)$. A fixed finite number of Hölder classes can be handled by a straightforward extension of the proof of Theorem 2. Of more interest is to consider a continuum of smoothness parameters: adaptive estimators that attain the minimax sup-norm risk over each element of the collection $\bigcup_{0 < s \le R} \Sigma(s)$ exist; see Theorem 5 below. Following Theorem 2 a first approach might seem to introduce analogues of the sets $\tilde\Sigma(s, \rho_n)$ as

$$ \Big\{ f \in \Sigma(s) : \inf_{g \in \Sigma(t)} \|g - f\|_\infty \ge \rho_n(s)\ \forall t > s \Big\}. $$

However this does not make sense as the sets $\{\Sigma(t)\}_{t > s}$ are $\|\cdot\|_\infty$-dense in $\Sigma(s)$, so that so-defined $\tilde\Sigma(s, \rho_n(s))$ would be empty [unless $\rho_n(s) = 0$]. Rather one should note that any adaptation problem with a continuous smoothness parameter $s$ and convergence rates that are polynomial in $n$ can be recast as an adaptation problem with a discrete parameter set whose cardinality grows logarithmically in $n$. Indeed let us dissect $[r, R]$ into $|\mathcal{S}_n| \simeq \log n$ points

$$ \mathcal{S}_n := \mathcal{S}_n(\zeta) = \{ s_i,\ i = 1, \dots, |\mathcal{S}_n| \} $$

that include $r = s_1$, $R = s_{|\mathcal{S}_n|}$, $s_i < s_{i+1}\ \forall i$, and each of which has at most $2\zeta/\log n$ and at least $\zeta/\log n$ distance to the next point, where $\zeta > 0$ is a fixed constant. A simple calculation shows

$$ r_n(s_i) \le C r_n(s) \tag{2.15} $$

for some constant $C = C(\zeta, R)$ and every $s_i \le s < s_{i+1}$, so that any estimator that is adaptive over $\Sigma(s)$, $s \in \mathcal{S}_n$, is also adaptive over $\Sigma(s)$, $s \in [r, R]$.
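The discretization and the inequality (2.15) can be checked numerically. The sketch below assumes the rate $r_n(s) = (\log n/n)^{s/(2s+1)}$ and an equally spaced grid, which is only one admissible choice; the names `rate`, `make_grid` and `worst_ratio` are illustrative. Since $s/(2s+1) - s_i/(2s_i+1) \le s - s_i \le 2\zeta/\log n$, the ratio $r_n(s_i)/r_n(s)$ is at most $(n/\log n)^{2\zeta/\log n} \le e^{2\zeta}$ for $n \ge 3$, so $C = e^{2\zeta}$ is one constant that works; the text only asserts $C = C(\zeta, R)$.

```python
import math

def rate(n, s):
    """r_n(s) = (log n / n)^(s / (2s + 1)), assumed form of the sup-norm rate."""
    return (math.log(n) / n) ** (s / (2 * s + 1))

def make_grid(n, r, R, zeta):
    """Equally spaced grid from r to R with spacing in [zeta/log n, 2*zeta/log n]."""
    lo = zeta / math.log(n)
    if R - r < lo:
        raise ValueError("n too small: [r, R] is shorter than zeta / log n")
    m = int((R - r) / lo)          # number of steps, at least 1
    step = (R - r) / m             # lies in [lo, 2 * lo]
    return [r + i * step for i in range(m)] + [R]

def worst_ratio(n, grid, pts_per_cell=50):
    """Largest value of r_n(s_i) / r_n(s) over s in [s_i, s_{i+1})."""
    worst = 0.0
    for s_i, s_next in zip(grid, grid[1:]):
        for t in range(pts_per_cell):
            s = s_i + (s_next - s_i) * t / pts_per_cell
            worst = max(worst, rate(n, s_i) / rate(n, s))
    return worst

# Illustrative parameters (assumptions, not values from the text).
n, r, R, zeta = 10 ** 6, 0.5, 2.0, 1.0
grid = make_grid(n, r, R, zeta)
worst = worst_ratio(n, grid)
```

The number of grid points grows like $\log n$, and the observed worst ratio stays well below $e^{2\zeta}$.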
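After this discretization we can define

$$ \tilde\Sigma(s, \rho_n(s), \mathcal{S}_n) = \Big\{ f \in \Sigma(s) : \inf_{g \in \Sigma(t)} \|g - f\|_\infty \ge \rho_n(s)\ \forall t > s,\ t \in \mathcal{S}_n \Big\}, $$

where $\rho_n(s)$ is a sequence of nonnegative real numbers. We are interested in the existence of adaptive confidence bands over

$$ \Sigma(R) \cup \Big( \bigcup_{s \in \mathcal{S}_n \setminus \{R\}} \tilde\Sigma(s, \rho_n(s), \mathcal{S}_n) \Big) $$

under sharp conditions on $\rho_n(s)$.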

Let us first address lower bounds, where we consider $T = [0,1]$ for simplicity. Theorem 2 cannot be applied directly since the smoothness index $s$ depends on $n$ in the present setting, and any two $s, s' \in \mathcal{S}_n$ could be as close as $\zeta/\log n$ possibly. If the constant $\zeta$ is taken large enough (but finite) one can prove the following result.



Theorem 3 (Lower bound). Let $T = [0,1]$, $L \ge 1$ and $0 < \alpha < 1/3$ be given, and let $\mathcal{S}_n(\zeta)$ be a grid as above. Let $s < s'$ be any two points in $\mathcal{S}_n(\zeta)$ and suppose that $C_n$ is a confidence band that is asymptotically honest with level $\alpha$ over

$$ \Sigma(s') \cup \tilde\Sigma(s, \rho_n(s), \mathcal{S}_n), $$

and that is adaptive in the sense that

$$ \sup_{f \in \Sigma(s')} E_f|C_n| \le L r_n(s'), \qquad \sup_{f \in \tilde\Sigma(s, \rho_n(s), \mathcal{S}_n)} E_f|C_n| \le L r_n(s) $$

for every $n$ large enough. Then if $\zeta := \zeta(R, B, L, \alpha)$ is a large enough but finite constant, we necessarily have

$$ \liminf_n \frac{\rho_n(s)}{r_n(s)} > 0. $$
A version of Theorem 3 for $T = \mathbb{R}$ can be proved as well, by natural modifications of its proof.

To show that adaptive procedures exist if $B$ is known define

$$ \Sigma_n(s) := \Big\{ f \in \Sigma(s) : \inf_{g \in \Sigma(t)} \|g - f\|_\infty \ge L_0 r_n(s)\ \forall t \in \mathcal{S}_n,\ t > s \Big\}, $$

where $s$ varies in $[r, R)$, and where $L_0 > 0$. Setting $\Sigma_n(R) = \Sigma(R)$ for notational convenience, we now prove that an adaptive and honest confidence band exists, for $L_0$ large enough, over the class

$$ \mathcal{V}_n(L_0) := \mathcal{V}(\mathcal{S}_n, B, L_0, n) := \bigcup_{s \in \mathcal{S}_n} \Sigma_n(s). $$

Analyzing the limit set (as $n \to \infty$) of $\mathcal{V}_n(L_0)$, or a direct comparison to the continuous scale of classes in (2.12), seems difficult, as $\mathcal{S}_n$ depends on $n$ now. Note, however, that one can always choose $\{\mathcal{S}_n\}_{n \ge 1}$ in a nested way, and $\zeta$ large enough, such that $\mathcal{V}_n(L_0)$ contains, for every $n$, any fixed finite union (over $s$) of sets of the form $\Sigma(\varepsilon, s, \ell_n)$ (using Proposition 3).

Theorem 4 (Existence of adaptive bands). Let $X_1, \dots, X_n$ be i.i.d. random variables on $T = [0,1]$ or $T = \mathbb{R}$ with density $f \in \mathcal{V}_n(L_0)$ and suppose $B, r, R, 0 < \alpha < 1$ are given. Then, if $L_0$ is large enough depending only on $B$, a confidence band $C_n = C_n(B, r, R, \alpha; X_1, \dots, X_n)$ can be constructed such that

$$ \liminf_n \inf_{f \in \mathcal{V}_n(L_0)} \Pr\nolimits_f(f \in C_n) \ge 1 - \alpha $$

and, for every $s \in \mathcal{S}_n$, $n \in \mathbb{N}$ and some constant $L'$ independent of $n$,

$$ \sup_{f \in \Sigma_n(s)} E_f|C_n| \le L' r_n(s). \tag{2.16} $$

3. Proofs of remaining results. 

Proof of Proposition 1. On the events $\{f \in C_n\}$ we can find a random density $T_n \in C_n$ depending only on $C_n$ such that $\{|C_n| \le D, f \in C_n\} \subseteq \{\|T_n - f\|_\infty \le D\}$ for any $D > 0$, and negating this inclusion we have

$$ \{|C_n| > D\} \cup \{f \notin C_n\} \supseteq \{\|T_n - f\|_\infty > D\} $$

so that $\Pr_f(|C_n| > D) \ge \Pr_f(\|T_n - f\|_\infty > D) - \Pr_f(f \notin C_n)$. Thus, using coverage of the band

$$ \liminf_n \sup_{f \in \Sigma(r,B)} \Pr\nolimits_f(|C_n| > cB^p r_n(r)) \ge \liminf_n \sup_{f \in \Sigma(r,B)} \Pr\nolimits_f(\|T_n - f\|_\infty > cB^p r_n(r)) - \alpha. $$

The limit inferior in the last line equals 1 as soon as $c > 0$ is chosen small enough depending only on $r, p$ in view of Theorem 1 in [21]; see also page 1114 as well as Lemma A.2 in that paper. Taking liminf's in the inequality

$$ \sup_{f \in \Sigma(r,B)} \frac{E_f|C_n|}{r_n(r)} \ge cB^p \sup_{f \in \Sigma(r,B)} \Pr\nolimits_f(|C_n| > cB^p r_n(r)) $$

gives the result. □

Proof of Proposition 3. (a) Observe first that for every $l_0 \ge \ell_n$,

$$ \|\psi\|_\infty \sum_{l \ge l_0} 2^{l/2} \sup_k |\beta_{lk}(f)| \ge \|K_{l_0}(f) - f\|_\infty \ge \varepsilon 2^{-l_0 s}. $$

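Let $N$ be a fixed integer, and let $\ell_n' \ge \ell_n$ be a sequence of integers to be chosen later. Then for some $\bar l \in [\ell_n', \ell_n' + N - 1]$

$$ \sup_k |\beta_{\bar l k}(f)| \ge \frac{1}{N} \sum_{l=\ell_n'}^{\ell_n'+N-1} \sup_k |\beta_{lk}(f)| \ge \frac{2^{-(\ell_n'+N)/2}}{N} \Big( \sum_{l=\ell_n'}^{\infty} 2^{l/2} \sup_k |\beta_{lk}(f)| - \sum_{l=\ell_n'+N}^{\infty} 2^{l/2} \sup_k |\beta_{lk}(f)| \Big) \ge d(\varepsilon, B, \psi, s) 2^{-\ell_n'(s+1/2)} $$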

for some $d(\varepsilon, B, \psi, s) > 0$ if $N$ is chosen large enough but finite depending only on $\varepsilon, B, \psi, s$. From (2.4) we thus have, for any $t > s$,

$$ \inf_{g \in \Sigma(t)} \|f - g\|_\infty \ge \inf_{g \in \Sigma(t)} \sup_{l \ge \ell_n', k} 2^{l/2} |\beta_{lk}(f - g)| $$
$$ \ge d(\varepsilon, B, \psi, s) 2^{-\ell_n' s} - \sup_{g \in \Sigma(t)} \sup_{l \ge \ell_n', k} 2^{l/2} |\beta_{lk}(g)| $$
$$ \ge d(\varepsilon, B, \psi, s) 2^{-\ell_n' s} - B 2^{-\ell_n' t} $$
$$ \ge c(\varepsilon, B, s, t, \psi) 2^{-\ell_n s}, $$

where we have chosen $\ell_n'$ large enough depending only on $B, s, t, d(\varepsilon, B, \psi, s)$ but still of order $O(\ell_n)$. This completes the proof of the first claim. The second claim is immediate in view of the definitions.

(b) Take $f = f_0 + 2^{-\ell_n(s+1/2)} \psi_{\ell_n m}$ for some $m$. Then $\|f\|_{s,\infty} \le 1$ so $f \in \Sigma(s, B)$ and the estimate in the last display of the proof of part (a) implies

$$ \inf_{g \in \Sigma(t)} \|f - g\|_\infty \ge c 2^{-\ell_n s} \ge L_0' r_n(s) $$

for $n$ large enough depending only on $B, s, t, L_0', \psi$. On the other hand $\|K_{\ell_n+1}(f) - f\|_\infty = 0$ so $f \notin \Sigma(\varepsilon, s, \ell_n)$ for any $\varepsilon > 0$. □

Proof of Proposition 4. Using (2.4) we have

$$ \|K_j(U_s) - U_s\|_\infty \ge \|\psi\|_1^{-1} \sup_{l \ge j, k} 2^{l/2} |\beta_{lk}(U_s)| \ge \|\psi\|_1^{-1} 2^{-js} \max_{k=1,\dots,2^j} |u_{jk}|. $$

The variables $u_{jk}/B$ are i.i.d. $U(-1, 1)$ and so the $U_k$'s, $U_k := |u_{jk}/B|$, are i.i.d. $U(0, 1)$ with maximum equal to the largest order statistic $U_{(2^j)}$. Deduce

$$ \Pr(\|K_j(U_s) - U_s\|_\infty < \varepsilon B 2^{-js}) \le \Pr(U_{(2^j)} < \varepsilon) = \varepsilon^{2^j} $$

to complete the proof. □
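The only probabilistic fact used here is that the maximum of $2^j$ independent $U(0,1)$ variables falls below $\varepsilon$ with probability $\varepsilon^{2^j}$. A small simulation makes this concrete; it is an illustration rather than part of the proof, and the function names `exact_probability` and `simulate` are hypothetical.

```python
import random

def exact_probability(eps, j):
    """P(max of 2^j i.i.d. Uniform(0,1) variables < eps) = eps^(2^j)."""
    return eps ** (2 ** j)

def simulate(eps, j, trials, rng):
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(trials):
        largest = max(rng.random() for _ in range(2 ** j))
        if largest < eps:
            hits += 1
    return hits / trials

# Illustrative parameters (assumptions, not values from the text).
eps, j = 0.8, 3
estimate = simulate(eps, j, trials=20000, rng=random.Random(0))
exact = exact_probability(eps, j)
```

The exact value decays doubly exponentially in $j$, which is why the exceptional sets carry so little prior mass.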

Proof of Theorem 3. The proof is a modification of the "necessity part" of Theorem 2. Let us assume w.l.o.g. $B \ge 2$, $R \ge 1$, let us write, in slight abuse of notation, $s_n, s_n'$ for $s, s'$ throughout this proof to highlight the dependence on $n$ and choose $j_n(s_n) \in \mathbb{N}$ such that

$$ \left( \frac{n}{\log n} \right)^{1/(2R+1)} \le c_0 \left( \frac{n}{\log n} \right)^{1/(2s_n+1)} \le 2^{j_n(s_n)} \le \left( \frac{n}{\log n} \right)^{1/(2s_n+1)} $$

holds for some $c_0 > 1/(2R+1)^{1/(2R+1)}$ and every $n$ large enough. We shall assume that $\zeta$ is any fixed number satisfying

$$ \zeta > (4R+2) \max\Big( \log_2((4R+2)B),\ (2R+1) \log \frac{(4R+2)L}{\alpha} \Big) $$

in the rest of the proof, and we shall establish $\liminf_n (\rho_n(s_n)/L r_n(s_n^+)) > 0$, where $s_n^+ > s_n$ is the larger "neighbor" of $s_n$ in $\mathcal{S}_n$. This completes the proof since $\liminf_n r_n(s_n^+)/r_n(s_n) \ge c(\zeta) > 0$ by definition of the grid.

Assume thus by way of contradiction that $\liminf_n (\rho_n(s_n)/L r_n(s_n^+)) = 0$ so that, by passing to a subsequence of $n$ if necessary, $\rho_n(s_n) \le \delta L r_n(s_n^+)$ for every $\delta > 0$ and every $n = n(\delta)$ large enough. Let $\varepsilon := 1/(2R+1)$ and define

$$ f_0 = 1, \qquad f_m = f_0 + \varepsilon 2^{-j(s_n+1/2)} \psi_{jm}, \quad m = 1, \dots, M, $$

as in the proof of Theorem 2, $c' 2^j \le M \le 2^j$, $c' > 0$. Then $f_m \in \Sigma(s_n)$ for every $j \ge j_0$ where $j_0$ can be taken to depend only on $r, R, B, \psi$. Moreover for $j \ge (\log n)/(4R+2)$ we have, using (2.4) and the assumption on $\zeta$, for any $g \in \Sigma(t)$, $t \in \mathcal{S}_n$, $t > s_n$, and every $m$

$$ \|f_m - g\|_\infty \ge \sup_{l \ge 0, k} 2^{l/2} |\beta_{lk}(f_m) - \beta_{lk}(g)| \ge \varepsilon 2^{-js_n} - 2^{j/2} |\beta_{jm}(g)| \ge \varepsilon 2^{-js_n} - B 2^{-jt} \ge 2^{-js_n} \big( \varepsilon - B 2^{-j\zeta/\log n} \big) \ge \frac{\varepsilon}{2} 2^{-js_n}. \tag{3.1} $$
We thus see that

$$ f_m \in \tilde\Sigma\Big( s_n, \frac{\varepsilon}{2} 2^{-js_n}, \mathcal{S}_n \Big) \quad \forall m = 1, \dots, M, $$

for every $j \ge J_0 := \max(j_0, (\log n)/(4R+2))$. Take now $j = j_n(s_n)$ which exceeds $J_0$ for $n$ large enough, and conclude

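$$ \frac{\varepsilon}{2} 2^{-j_n(s_n) s_n} \ge \frac{\varepsilon}{2} r_n(s_n) \ge \frac{\varepsilon}{2} \big( e^{\zeta/2} \big)^{1/(2R+1)^2} r_n(s_n^+) \ge \frac{L}{\alpha} r_n(s_n^+) \ge \rho_n(s_n) \tag{3.2} $$

for $n$ large enough, where we have used the definition of the grid $\mathcal{S}_n$, of $\varepsilon$, the assumption on $\zeta$ and the hypothesis on $\rho_n$. Summarizing $f_m \in \tilde\Sigma(s_n, \rho_n(s_n), \mathcal{S}_n)$ for every $m = 1, \dots, M$ and every $n \ge n_0$, $n_0 = n_0(r, R, B, \psi)$.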

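Suppose now $C_n$ is a confidence band that is adaptive and asymptotically honest over $\Sigma(s_n') \cup \tilde\Sigma(s_n, \rho_n(s_n), \mathcal{S}_n)$, and consider testing $H_0 : f = f_0$ against $H_1 : f \in \{f_1, \dots, f_M\} =: \mathcal{M}$. Define a test $\Psi_n$ as follows: if no $f_m \in C_n$ then $\Psi_n = 0$, but as soon as one of the $f_m$'s is contained in $C_n$ then $\Psi_n = 1$. Now since $r_n(s_n') \le r_n(s_n^+)$ and using (3.1), (3.2) we have

$$ \Pr\nolimits_{f_0}(\Psi_n \ne 0) = \Pr\nolimits_{f_0}(f_m \in C_n \text{ for some } m) $$
$$ \le \Pr\nolimits_{f_0}(\|f_m - f_0\|_\infty \le |C_n| \text{ for some } m) + \alpha + o(1) $$
$$ \le \Pr\nolimits_{f_0}(|C_n| \ge (L/\alpha) r_n(s_n^+)) + \alpha + o(1) $$
$$ \le \alpha r_n(s_n')/r_n(s_n^+) + \alpha + o(1) \le 2\alpha + o(1). $$

Under any alternative $f_m \in \tilde\Sigma(s_n, \rho_n(s_n), \mathcal{S}_n)$, invoking honesty of the band we have

$$ \Pr\nolimits_{f_m}(\Psi_n = 0) = \Pr\nolimits_{f_m}(\text{no } f_k \in C_n) \le \Pr\nolimits_{f_m}(f_m \notin C_n) \le \alpha + o(1) $$

so that summarizing we have

$$ \limsup_n \Big( E_{f_0} \Psi_n + \sup_{f \in \mathcal{M}} E_f(1 - \Psi_n) \Big) \le 3\alpha < 1. $$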

But this has led to a contradiction by the same arguments as in the proof of Theorem 2, noting in the last step that $n\gamma_j^2 = \varepsilon^2 n 2^{-j_n(s_n)(2s_n+1)} \le (\varepsilon^2/(c_0)^{2R+1}) \log n$ and thus

$$ \frac{e^{n\gamma_j^2} - 1}{M} \le \frac{1}{c' c_0} e^{(\varepsilon^2/(c_0)^{2R+1}) \log n} \left( \frac{\log n}{n} \right)^{1/(2R+1)} = o(1) $$

since $1/(2R+1) = \varepsilon < c_0^{2R+1}$. □
Proof of Theorem 4. We shall only prove the more difficult case $T = \mathbb{R}$. Let $j_i$ be such that $2^{j_i} \simeq (n/\log n)^{1/(2s_i+1)}$, let $f_n(j)$ be as in (2.2) based on wavelets of regularity $S > R$ and define test statistics

$$ d_n(i) := \inf_{g \in \Sigma(s_{i+1})} \|f_n(j_i) - g\|_\infty, \quad i = 1, \dots, |\mathcal{S}_n| - 1. $$

Recall further $\sigma(j)$ from (2.9) and, for a constant $L$ to be chosen below, define tests

$$ \Psi(i) := \begin{cases} 0, & \text{if } d_n(i) \le L\sigma(j_i), \\ 1, & \text{otherwise}, \end{cases} $$

to accept $H_0 : f \in \Sigma(s_{i+1})$ against the alternative $H_1 : f \in \Sigma_n(s_i)$. Starting from the largest model we first test $H_0 : f \in \Sigma(s_2)$ against $H_1 : f \in \Sigma_n(r)$. If $H_0$ is rejected we set $\hat s_n = r$, otherwise we proceed to test $H_0 : f \in \Sigma(s_3)$ against $H_1 : f \in \Sigma_n(s_2)$ and iterating this procedure downwards we define $\hat s_n$ to be the first element $s_i$ in $\mathcal{S}_n$ for which $\Psi(i) = 1$ rejects. If no rejection occurs set $\hat s_n = R$.
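The sequential testing rule can be sketched in a few lines. The sketch assumes that the distances $d_n(i)$ and the thresholds $L\sigma(j_i)$ have already been computed and are passed in as plain lists; computing the infimum over a Hölder ball is not attempted here, and the function name `select_smoothness` is hypothetical.

```python
def select_smoothness(grid, distances, thresholds):
    """Return the first grid point whose test rejects, or the last point if none does.

    grid:       increasing smoothness levels s_1 < ... < s_m (s_1 = r, s_m = R)
    distances:  d_n(i) for i = 1, ..., m - 1
    thresholds: L * sigma(j_i) for i = 1, ..., m - 1
    """
    if not (len(distances) == len(thresholds) == len(grid) - 1):
        raise ValueError("need one distance and one threshold per test")
    for s_i, d, threshold in zip(grid, distances, thresholds):
        if d > threshold:      # test i rejects H_0: f in Sigma(s_{i+1})
            return s_i
    return grid[-1]            # no rejection: choose the smoothest class

# Illustrative inputs (assumptions, not values from the text).
grid = [0.5, 1.0, 1.5, 2.0]
chosen = select_smoothness(grid, [0.1, 0.9, 0.2], [0.5, 0.5, 0.5])
```

In this toy input the first test accepts and the second rejects, so the rule stops at the second grid point.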

For $f \in \mathcal{V}_n(L_0)$ define $s_{i_0} := s_{i_0}(f) = \max\{s \in \mathcal{S}_n : f \in \Sigma_n(s)\}$.

Lemma 1. We can choose the constants $L$ and then $L_0$ depending only on $B, \phi, \psi$ such that

$$ \sup_{f \in \mathcal{V}_n(L_0)} \Pr\nolimits_f(\hat s_n \ne s_{i_0}(f)) \le C n^{-2} $$

for some constant $C$ and every $n$ large enough.

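Proof. If $\hat s_n < s_{i_0}$, then the test $\Psi(i)$ has rejected for some $i < i_0$. In this case $f \in \Sigma_n(s_{i_0}) \subset \Sigma(s_{i_0}) \subseteq \Sigma(s_{i+1})$ for every $i < i_0$, and thus, proceeding as in (2.10) and using Proposition 5 below, we have for $L$ and then $d$ large enough depending only on $B, K$

$$ \Pr\nolimits_f(\hat s_n < s_{i_0}) = \Pr\nolimits_f\Big( \bigcup_{i < i_0} \Big\{ \inf_{g \in \Sigma(s_{i+1})} \|f_n(j_i) - g\|_\infty > L\sigma(j_i) \Big\} \Big) $$
$$ \le \sum_{i < i_0} \Pr\nolimits_f\big( \|f_n(j_i) - E_f f_n(j_i)\|_\infty > (L - b)\sigma(j_i) \big) $$
$$ \le C' |\mathcal{S}_n| e^{-d \log n} \le C n^{-2}. $$

On the other hand if $\hat s_n > s_{i_0}$ (ignoring the trivial case $s_{i_0} = R$), then $\Psi(i_0)$ has accepted despite $f \in \Sigma_n(s_{i_0})$. Thus, using $r_n(s_{i_0}) \ge c\sigma(j_{i_0})$ for some $c = c(B)$ and proceeding as in (2.10) we can bound $\Pr_f(\hat s_n > s_{i_0})$ by

$$ \Pr\nolimits_f\Big( \inf_{g \in \Sigma(s_{i_0+1})} \|f_n(j_{i_0}) - g\|_\infty \le L\sigma(j_{i_0}) \Big) $$
$$ \le \Pr\nolimits_f\Big( \inf_{g \in \Sigma(s_{i_0+1})} \|f - g\|_\infty - \|f_n(j_{i_0}) - E_f f_n(j_{i_0})\|_\infty - \|E_f f_n(j_{i_0}) - f\|_\infty \le L\sigma(j_{i_0}) \Big) $$
$$ \le \Pr\nolimits_f\big( L_0 r_n(s_{i_0}) - \|K_{j_{i_0}}(f) - f\|_\infty - L\sigma(j_{i_0}) \le \|f_n(j_{i_0}) - E_f f_n(j_{i_0})\|_\infty \big) $$
$$ \le \Pr\nolimits_f\big( \|f_n(j_{i_0}) - E_f f_n(j_{i_0})\|_\infty \ge (cL_0 - L - b)\sigma(j_{i_0}) \big) $$
$$ \le C e^{-c' \log n} \le \frac{C}{n^2} $$

for $L_0$ and then also $c' > 0$ large enough, using Proposition 5 below. □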

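Take now $\hat f_n$ to be an estimator of $f$ that is adaptive in sup-norm loss over $\bigcup_{s \in [r,R]} \Sigma(s)$ as in Theorem 5 below and define the confidence band

$$ C_n := \hat f_n \pm M \left( \frac{\log n}{n} \right)^{\hat s_n/(2\hat s_n+1)} $$

where $M$ is chosen below. For $f \in \Sigma_n(s_{i_0})$ the lemma implies

$$ E_f|C_n| \le 2M \left( \frac{\log n}{n} \right)^{s_{i_0}/(2s_{i_0}+1)} + 2M \left( \frac{\log n}{n} \right)^{r/(2r+1)} \times \Pr\nolimits_f(\hat s_n < s_{i_0}) \le C(M) \left( \frac{\log n}{n} \right)^{s_{i_0}/(2s_{i_0}+1)} $$

so this band is adaptive.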

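For coverage, we have, again from the lemma and Markov's inequality

$$ \Pr\nolimits_f(f \in C_n) = \Pr\nolimits_f(\|\hat f_n - f\|_\infty \le M r_n(\hat s_n)) $$
$$ \ge 1 - \Pr\nolimits_f(\|\hat f_n - f\|_\infty > M r_n(s_{i_0})) - \Pr\nolimits_f(\hat s_n > s_{i_0}) $$
$$ \ge 1 - \frac{E_f\|\hat f_n - f\|_\infty}{M r_n(s_{i_0})} - \frac{C}{n^2} $$
$$ \ge 1 - \frac{D(B, R, r)}{M} - \frac{C}{n^2}, $$

which is greater than or equal to $1 - \alpha$ for $M$ and $n$ large enough depending only on $B, R, r$. □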

Proof of part (c) of Theorem 2. The analog of case (b) is imme- 
diate. The analog of part (a) requires the following modifications: set again 
/o = 1 on [0, 1] , < j' n < j n to be chosen below, and define 

f m := / + J B2"^( s+1 /2)^, ^ + e2 -^ + i/2)^ 






where m = 1, . . . , M ~ 2 3 , all V'Zfc's are Daubechies wavelets supported in the 
interior of [0, 1] and where mo ^ m is chosen such that ?/y mo and ipj n m have 
disjoint support for every m (which is possible for j n ,j' n large enough since 
Daubechies wavelets have localized support). Recalling j* from the proof of 
part (a), we can choose j' n ,j n in such a way that j' n < j n ,2~ JnV = o(2~ JnV ), 

f m e E(r, p n ) Vm, ft := f + B2-^' +1 ^ j(imo E £(*, (B/2)r„(a)) 

for every n > no, no = no(s,r,B,£,tl)). Now if C n is a confidence band that is 
adaptive and honest over £(s,r n (s)) U E(r, p n ) consider testing iJo '■ f = fb 
against iJx : / £ • • • ,/m} ='■ M.- The same arguments as before (2.7) 
show that there exists a test ^ n such that limsup n (£'j ^ , n + supj g _ A/( Ef(l — 
^n)) < 2a < 1 along a subsequence of n, a claim that leads to a contra- 
diction since we can lower bound the error probabilities of any test as in 
the original proof above, the only modification arising in the bound for the 
likelihood ratio. Let Pq be the n-fold product probability measure induced 
by the density ft and set Z = (1/M) J2m=i( dP m/dPo). We suppress now 
the dependence of j n on n for notational simplicity, and define shorthand 
jj = £ 2-i( r + 1 / 2 ), Kj = j B2-J"( s+1 /2). To bound E f , {Z - l) 2 we note that, us- 
ing orthonormality of the ipjm's, that J ipj m = and that i/jj' mo has disjoint 
support with ipj m , m = 1, . . . , M, we have (m ^ ml) 

rm . 



(1 + Kjlpj'mo? 

-fb= I ^'m = 0, 



1 + Kj1pji mQ 
V) 



2 

JO = / Vim = L 



(l + ^m ) 2JU J 

The identities in the last display can be used to bound E^(Z — l) 2 by 

M 



m 2 J [0>1]n ( £ ( n n 




\f'o(xi)dx 



ft(x)dx 



M« 1 + ^ M 
The rest of the proof is as in part (a) of Theorem 2. □ 
3.1. Auxiliary results. The following theorem is due to [10, 11, 13]. We state a version that follows from Theorem 4 in [25] for $T = \mathbb{R}$. In case $T = [0,1]$ it follows from the same proofs. The restriction that $B$ be known is not necessary but suffices for our present purposes.

Theorem 5. Let $X_1, \dots, X_n$ be i.i.d. with uniformly continuous density $f$ on $T = [0,1]$ or $T = \mathbb{R}$. Then for every $r, R$, $0 < r < R$ there exists an estimator $\hat f_n(x) := \hat f_n(x, X_1, \dots, X_n, B, R)$ such that, for every $s$, $r \le s \le R$, some constant $D(B, r, R)$ and every $n \ge 2$ we have $\sup_{f \in \Sigma(s,B,T)} E\|\hat f_n - f\|_\infty \le D(B, r, R) r_n(s)$.

The following inequality was proved in [11] (see also page 1167 in [12]) for $T = \mathbb{R}$ (the case $T = [0,1]$ is similar, in fact simpler).

Proposition 5. Let $\phi, \psi$ be a compactly supported scaling and wavelet function, respectively, both $S$-Hölder for some $S > 0$. Suppose $P$ has a bounded density $f$ and let $f_n(x, j)$ be the estimator from (2.2). Given $C, C' > 0$, there exist finite positive constants $C_1 = C_1(C, K)$ and $C_2 = C_2(C, C', K)$ such that, if $(n/2^j j) \ge C$ and $C_1 \sqrt{(\|f\|_\infty \vee 1)(2^j j/n)} \le t \le C'$, then, for every $n \in \mathbb{N}$,

$\Pr_f\{ \sup_{x \in \mathbb{R}} |f_n(x, j) - E f_n(x, j)| \ge t \} \le$